[EPLB] Optimize EPLB with numpy #29499
Signed-off-by: ilmarkov <markovilya197@gmail.com>
Code Review
This pull request introduces significant optimizations to the EPLB algorithm by leveraging NumPy for performance gains, reducing GPU-CPU data transfers, and adding support for grouped layer weight exchanges. The introduction of preserve_intragpu_slots is a clever optimization to minimize unnecessary memory copies within the same GPU. The config processing bug fix is also a welcome improvement. Overall, the changes are well-implemented and the added tests provide good coverage for the new functionality. I have one suggestion to further optimize the local weight copy logic in move_to_buffer.
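As a rough illustration of the kind of NumPy-based lookup the review refers to, here is a minimal sketch of finding which ranks hold a given expert. The function name `ranks_with_expert`, the flat slot layout, and the `slots_per_rank` parameter are assumptions made for this example; they are not taken from the PR's code.

```python
import numpy as np


def ranks_with_expert(
    physical_to_logical: np.ndarray, expert: int, slots_per_rank: int
) -> np.ndarray:
    """Return the sorted unique ranks whose slots hold `expert`.

    Assumes a flat layout where slot i lives on rank i // slots_per_rank
    (a hypothetical layout chosen for this sketch).
    """
    # Vectorized comparison replaces a Python loop over slots.
    slots = np.flatnonzero(physical_to_logical == expert)
    return np.unique(slots // slots_per_rank)


# Two ranks with three slots each; experts 0 and 2 are replicated.
layout = np.array([0, 1, 2, 2, 3, 0])
print(ranks_with_expert(layout, 0, 3))  # [0 1]
```

Keeping the mapping on the CPU as a NumPy array lets lookups like this run without a GPU synchronization, which is the general direction the review summary describes.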
LGTM.
Hi @ilmarkov, the pre-commit checks have failed. Please run:
uv pip install pre-commit
pre-commit install
pre-commit run --all-files
Then, commit the changes and push to your branch.
SageMoore left a comment
I'm still going through the code, but here's another round of comments.
SageMoore left a comment
Ok I think this is getting pretty close. Given the complex nature of this PR, it would be good to include lm_eval runs for both Qwen/Qwen3-30B-A3B-Instruct-2507-FP8 and deepseek-ai/DeepSeek-V2-Lite with sync and async EPLB.
tlrmchlsmth left a comment
I am seeing a failure when running test_eplb_execute - @ilmarkov could you please take a look?
File "/home/tms/vllm/tests/distributed/test_eplb_execute.py", line 308, in _test_async_transfer_layer_without_mtp_worker
move_from_buffer(
TypeError: move_from_buffer() got an unexpected keyword argument 'ep_group'
[rank0]:[W106 22:51:45.365432802 ProcessGroupNCCL.cpp:1524] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
=============================================== warnings summary ================================================
.venv/lib/python3.12/site-packages/schemathesis/generation/coverage.py:305
/home/tms/vllm/.venv/lib/python3.12/site-packages/schemathesis/generation/coverage.py:305: DeprecationWarning: jsonschema.exceptions.RefResolutionError is deprecated as of version 4.18.0. If you wish to catch potential reference resolution errors, directly catch referencing.exceptions.Unresolvable.
ref_error: type[Exception] = jsonschema.RefResolutionError,
-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
============================================ short test summary info ============================================
FAILED tests/distributed/test_eplb_execute.py::test_async_transfer_layer_without_mtp[2-2-2-3] - AssertionError
Signed-off-by: ilmarkov <markovilya197@gmail.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
This PR adds the following optimizations to the EPLB algorithm, applicable to both sync and async modes:
- ... move_from_buffer.
- get_ep_ranks_with_expert, move_to_buffer, and move_from_buffer in numpy, which gives a 20-25% speedup on average for each primitive.
- preserve_intragpu_slots primitive that post-processes the results of the rebalance algorithm. It ensures that experts assigned to move within the same GPU stay in their slots, which avoids unnecessary GPU memory copies.
- Adds the log_balancedness_interval config parameter to tune the frequency of balancedness logs and possibly reduce the number of collectives required for this logging.
Purpose
Improves performance of sync and async EPLB. Fixes a bug in EPLB config post-processing.
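To make the slot-preservation idea concrete, here is a minimal sketch of a post-processing step in that spirit. It is not the PR's `preserve_intragpu_slots` implementation: the function name `keep_local_slots`, the flat per-layer layout, and the `slots_per_gpu` parameter are assumptions for illustration, and expert IDs are assumed to be non-negative so that -1 can serve as an "empty slot" marker.

```python
import numpy as np


def keep_local_slots(
    old: np.ndarray, new: np.ndarray, slots_per_gpu: int
) -> np.ndarray:
    """Reorder `new` per GPU so experts already on that GPU keep their slot.

    `old` and `new` are flat physical-to-logical maps for one layer
    (hypothetical layout: slot i lives on GPU i // slots_per_gpu).
    """
    out = new.copy()
    for start in range(0, len(old), slots_per_gpu):
        old_gpu = old[start:start + slots_per_gpu]
        pending = list(new[start:start + slots_per_gpu])
        result = np.full(slots_per_gpu, -1, dtype=new.dtype)
        # First pass: an expert that stays on this GPU keeps its old slot.
        for slot, expert in enumerate(old_gpu):
            if expert in pending:
                result[slot] = expert
                pending.remove(expert)
        # Second pass: incoming experts fill the remaining slots in order.
        free = np.flatnonzero(result == -1)
        result[free] = np.asarray(pending, dtype=new.dtype)
        out[start:start + slots_per_gpu] = result
    return out


old_map = np.array([0, 1, 2, 3, 4, 5])  # 2 GPUs x 3 slots
new_map = np.array([2, 0, 6, 5, 7, 3])  # same GPUs, shuffled slots
fixed = keep_local_slots(old_map, new_map, 3)
print(fixed)  # [0 6 2 3 7 5]
```

In this example the raw rebalance result would touch every slot, while the reordered map differs from the old one only in slots 1 and 4, so only the two genuinely new experts need to be copied in.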
Test Plan
Add tests for grouped layer weight exchange and for preserve_intragpu_slots.
Validation Result
Client
lm_eval --model local-completions --tasks gsm8k --model_args model=$model,base_url=http://0.0.0.0:$port/v1/completions,num_concurrent=50,max_retries=3,tokenized_requests=False
Main
Qwen-30B-FP8 DP=8
Qwen-30B-FP8 DP=8 AsyncEPLB=True
Qwen-30B-FP8 DP=8 AsyncEPLB=False
Main
DeepSeek-V2-Lite DP=4
vllm serve deepseek-ai/DeepSeek-V2-Lite --disable-log-requests --no-enable-prefix-caching -tp 1 -dp 4 --enable-eplb --eplb-config.window_size 128 --eplb-config.step_interval 512 --eplb-config.num_redundant_experts 16 --eplb-config.use_async true
DeepSeek-V2-Lite DP=4 AsyncEPLB=True
DeepSeek-V2-Lite DP=4 AsyncEPLB=False
Benchmark results
Server:
We compare the sync versions of EPLB. We profile the third EPLB call to get a more stable measurement.
In the profile logs:
Main: eplb_state.py: rearrange takes 1s370ms
PR: eplb_state.py: rearrange takes 1s77ms, a ~27% speedup (1370 ms / 1077 ms ≈ 1.27)
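The speedup figure can be checked with a few lines of arithmetic. This sketch assumes the profiler strings `1s370ms` and `1s77ms` mean 1370 ms and 1077 ms respectively; it also shows the equivalent reduction in wall time, which is a different number from the speedup ratio.

```python
# Timings read from the profile logs (assumed to be seconds + milliseconds).
main_ms = 1 * 1000 + 370
pr_ms = 1 * 1000 + 77

speedup = main_ms / pr_ms         # how many times faster the PR is
time_saved = 1 - pr_ms / main_ms  # fraction of wall time removed

print(f"speedup: {speedup:.2f}x, time saved: {time_saved:.1%}")
# speedup: 1.27x, time saved: 21.4%
```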
Profile logs of move_to_buffer and move_from_buffer
Main:
PR:
